import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
netflix=pd.read_csv("D:\\Python\\python-projects\\netflix-EDA\\archive (8)\\netflix_titles_2021.csv")
netflix
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8802 | s8803 | Movie | Zodiac | David Fincher | Mark Ruffalo, Jake Gyllenhaal, Robert Downey J... | United States | November 20, 2019 | 2007 | R | 158 min | Cult Movies, Dramas, Thrillers | A political cartoonist, a crime reporter and a... |
| 8803 | s8804 | TV Show | Zombie Dumb | NaN | NaN | NaN | July 1, 2019 | 2018 | TV-Y7 | 2 Seasons | Kids' TV, Korean TV Shows, TV Comedies | While living alone in a spooky town, a young g... |
| 8804 | s8805 | Movie | Zombieland | Ruben Fleischer | Jesse Eisenberg, Woody Harrelson, Emma Stone, ... | United States | November 1, 2019 | 2009 | R | 88 min | Comedies, Horror Movies | Looking to survive in a world taken over by zo... |
| 8805 | s8806 | Movie | Zoom | Peter Hewitt | Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... | United States | January 11, 2020 | 2006 | PG | 88 min | Children & Family Movies, Comedies | Dragged from civilian life, a former superhero... |
| 8806 | s8807 | Movie | Zubaan | Mozez Singh | Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan... | India | March 2, 2019 | 2015 | TV-14 | 111 min | Dramas, International Movies, Music & Musicals | A scrappy but poor boy worms his way into a ty... |
8807 rows × 12 columns
#show top 5 rows
netflix.head(5)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
#show bottom 5
netflix.tail(5)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8802 | s8803 | Movie | Zodiac | David Fincher | Mark Ruffalo, Jake Gyllenhaal, Robert Downey J... | United States | November 20, 2019 | 2007 | R | 158 min | Cult Movies, Dramas, Thrillers | A political cartoonist, a crime reporter and a... |
| 8803 | s8804 | TV Show | Zombie Dumb | NaN | NaN | NaN | July 1, 2019 | 2018 | TV-Y7 | 2 Seasons | Kids' TV, Korean TV Shows, TV Comedies | While living alone in a spooky town, a young g... |
| 8804 | s8805 | Movie | Zombieland | Ruben Fleischer | Jesse Eisenberg, Woody Harrelson, Emma Stone, ... | United States | November 1, 2019 | 2009 | R | 88 min | Comedies, Horror Movies | Looking to survive in a world taken over by zo... |
| 8805 | s8806 | Movie | Zoom | Peter Hewitt | Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... | United States | January 11, 2020 | 2006 | PG | 88 min | Children & Family Movies, Comedies | Dragged from civilian life, a former superhero... |
| 8806 | s8807 | Movie | Zubaan | Mozez Singh | Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan... | India | March 2, 2019 | 2015 | TV-14 | 111 min | Dramas, International Movies, Music & Musicals | A scrappy but poor boy worms his way into a ty... |
#show the total number of rows and columns
netflix.shape
(8807, 12)
The dataset contains 8807 rows and 12 columns.
#to show each column
netflix.columns
Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
'release_year', 'rating', 'duration', 'listed_in', 'description'],
dtype='object')
#show the data type of each column
netflix.dtypes
show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object
netflix.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB
#statistical information
netflix.describe()
| | release_year |
|---|---|
| count | 8807.000000 |
| mean | 2014.180198 |
| std | 8.819312 |
| min | 1925.000000 |
| 25% | 2013.000000 |
| 50% | 2017.000000 |
| 75% | 2019.000000 |
| max | 2021.000000 |
netflix.describe(include='all')
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8807 | 8807 | 8807 | 6173 | 7982 | 7976 | 8797 | 8807.000000 | 8803 | 8804 | 8807 | 8807 |
| unique | 8807 | 2 | 8807 | 4528 | 7692 | 748 | 1767 | NaN | 17 | 220 | 514 | 8775 |
| top | s1 | Movie | Dick Johnson Is Dead | Rajiv Chilaka | David Attenborough | United States | January 1, 2020 | NaN | TV-MA | 1 Season | Dramas, International Movies | Paranormal activity at a lush, abandoned prope... |
| freq | 1 | 6131 | 1 | 19 | 19 | 2818 | 109 | NaN | 3207 | 1793 | 362 | 4 |
| mean | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2014.180198 | NaN | NaN | NaN | NaN |
| std | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 8.819312 | NaN | NaN | NaN | NaN |
| min | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1925.000000 | NaN | NaN | NaN | NaN |
| 25% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2013.000000 | NaN | NaN | NaN | NaN |
| 50% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2017.000000 | NaN | NaN | NaN | NaN |
| 75% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019.000000 | NaN | NaN | NaN | NaN |
| max | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021.000000 | NaN | NaN | NaN | NaN |
#finding how many unique values are in dataset
netflix.nunique()
show_id         8807
type               2
title           8807
director        4528
cast            7692
country          748
date_added      1767
release_year      74
rating            17
duration         220
listed_in        514
description     8775
dtype: int64
#check null values
netflix.isnull()
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | True | False | False | False | False | False | False | False |
| 1 | False | False | False | True | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | True | False | False | False | False | False | False |
| 3 | False | False | False | True | True | True | False | False | False | False | False | False |
| 4 | False | False | False | True | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8802 | False | False | False | False | False | False | False | False | False | False | False | False |
| 8803 | False | False | False | True | True | True | False | False | False | False | False | False |
| 8804 | False | False | False | False | False | False | False | False | False | False | False | False |
| 8805 | False | False | False | False | False | False | False | False | False | False | False | False |
| 8806 | False | False | False | False | False | False | False | False | False | False | False | False |
8807 rows × 12 columns
#to show the count of null values
netflix.isnull().sum()
show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
miss1=(netflix.isnull().sum()/len(netflix))*100
miss1
show_id          0.000000
type             0.000000
title            0.000000
director        29.908028
cast             9.367549
country          9.435676
date_added       0.113546
release_year     0.000000
rating           0.045418
duration         0.034064
listed_in        0.000000
description      0.000000
dtype: float64
miss=netflix.isnull().sum()
miss
show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
#missing values with percent
m=pd.concat([miss,miss1],axis=1,keys=['total','missing%'])
m
| | total | missing% |
|---|---|---|
| show_id | 0 | 0.000000 |
| type | 0 | 0.000000 |
| title | 0 | 0.000000 |
| director | 2634 | 29.908028 |
| cast | 825 | 9.367549 |
| country | 831 | 9.435676 |
| date_added | 10 | 0.113546 |
| release_year | 0 | 0.000000 |
| rating | 4 | 0.045418 |
| duration | 3 | 0.034064 |
| listed_in | 0 | 0.000000 |
| description | 0 | 0.000000 |
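The count-and-percentage table above can also be built by a small reusable function. The sketch below is one possible way to do it; `missing_summary` and the `toy` DataFrame are hypothetical names introduced only for illustration, and the toy values are not taken from the Netflix data.

```python
import pandas as pd
import numpy as np

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, largest first."""
    total = df.isnull().sum()
    # The mean of a boolean column is the fraction of True values
    percent = df.isnull().mean() * 100
    summary = pd.concat([total, percent], axis=1, keys=["total", "missing%"])
    return summary.sort_values("total", ascending=False)

# Toy data standing in for the real table (illustrative values only)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["India", np.nan, "Canada", "India"],
})

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(netflix)` should give the same numbers as the table above, sorted so the columns with the most gaps come first.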
#using heat map to show null values
sns.heatmap(netflix.isnull())
<Axes: >
From the above output we can see that the director, cast, and country columns contain the most null values: director is missing in about 30% of rows, and cast and country in about 9% each.
The steps that follow drop the rows in which director or cast is missing. The columns themselves are kept, so this costs rows rather than features.
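Dropping rows and dropping columns are easy to confuse, so here is a minimal sketch of the difference on toy data. The `toy` DataFrame and the variable names are hypothetical; only `dropna` and `drop` are real pandas calls.

```python
import pandas as pd
import numpy as np

# Toy data (illustrative values only)
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", np.nan, "Z"],
    "cast": [np.nan, "P", "Q"],
})

# Drop ROWS where director or cast is missing; all columns survive
rows_dropped = toy.dropna(how="any", subset=["director", "cast"])

# Drop the COLUMNS themselves; all rows survive
cols_dropped = toy.drop(columns=["director", "cast"])

print(rows_dropped.shape)
print(cols_dropped.shape)
```

The first form is the one used in this notebook. It keeps director and cast available for later questions but removes every title that lacks either value.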
#making copy of dataset to make changes
netflix_copy=netflix.copy()
netflix_copy
netflix_copy.head(5)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
#dropping rows where director or cast is missing
netflix_copy=netflix_copy.dropna(how='any',subset=['director','cast'])
netflix_copy
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 5 | s6 | TV Show | Midnight Mass | Mike Flanagan | Kate Siegel, Zach Gilford, Hamish Linklater, H... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | TV Dramas, TV Horror, TV Mysteries | The arrival of a charismatic young priest brin... |
| 6 | s7 | Movie | My Little Pony: A New Generation | Robert Cullen, José Luis Ucha | Vanessa Hudgens, Kimiko Glenn, James Marsden, ... | NaN | September 24, 2021 | 2021 | PG | 91 min | Children & Family Movies | Equestria's divided. But a bright-eyed hero be... |
| 7 | s8 | Movie | Sankofa | Haile Gerima | Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D... | United States, Ghana, Burkina Faso, United Kin... | September 24, 2021 | 1993 | TV-MA | 125 min | Dramas, Independent Movies, International Movies | On a photo shoot in Ghana, an American model s... |
| 8 | s9 | TV Show | The Great British Baking Show | Andy Devonshire | Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho... | United Kingdom | September 24, 2021 | 2021 | TV-14 | 9 Seasons | British TV Shows, Reality TV | A talented batch of amateur bakers face off in... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8801 | s8802 | Movie | Zinzana | Majid Al Ansari | Ali Suliman, Saleh Bakri, Yasa, Ali Al-Jabri, ... | United Arab Emirates, Jordan | March 9, 2016 | 2015 | TV-MA | 96 min | Dramas, International Movies, Thrillers | Recovering alcoholic Talal wakes up inside a s... |
| 8802 | s8803 | Movie | Zodiac | David Fincher | Mark Ruffalo, Jake Gyllenhaal, Robert Downey J... | United States | November 20, 2019 | 2007 | R | 158 min | Cult Movies, Dramas, Thrillers | A political cartoonist, a crime reporter and a... |
| 8804 | s8805 | Movie | Zombieland | Ruben Fleischer | Jesse Eisenberg, Woody Harrelson, Emma Stone, ... | United States | November 1, 2019 | 2009 | R | 88 min | Comedies, Horror Movies | Looking to survive in a world taken over by zo... |
| 8805 | s8806 | Movie | Zoom | Peter Hewitt | Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... | United States | January 11, 2020 | 2006 | PG | 88 min | Children & Family Movies, Comedies | Dragged from civilian life, a former superhero... |
| 8806 | s8807 | Movie | Zubaan | Mozez Singh | Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan... | India | March 2, 2019 | 2015 | TV-14 | 111 min | Dramas, International Movies, Music & Musicals | A scrappy but poor boy worms his way into a ty... |
5700 rows × 12 columns
#filling missing values of country, rating, duration with 'missing'
netflix_copy=netflix_copy.fillna({'country':'missing','duration':'missing','rating':'missing'})
netflix_copy.head(5)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | missing | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 5 | s6 | TV Show | Midnight Mass | Mike Flanagan | Kate Siegel, Zach Gilford, Hamish Linklater, H... | missing | September 24, 2021 | 2021 | TV-MA | 1 Season | TV Dramas, TV Horror, TV Mysteries | The arrival of a charismatic young priest brin... |
| 6 | s7 | Movie | My Little Pony: A New Generation | Robert Cullen, José Luis Ucha | Vanessa Hudgens, Kimiko Glenn, James Marsden, ... | missing | September 24, 2021 | 2021 | PG | 91 min | Children & Family Movies | Equestria's divided. But a bright-eyed hero be... |
| 7 | s8 | Movie | Sankofa | Haile Gerima | Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D... | United States, Ghana, Burkina Faso, United Kin... | September 24, 2021 | 1993 | TV-MA | 125 min | Dramas, Independent Movies, International Movies | On a photo shoot in Ghana, an American model s... |
| 8 | s9 | TV Show | The Great British Baking Show | Andy Devonshire | Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho... | United Kingdom | September 24, 2021 | 2021 | TV-14 | 9 Seasons | British TV Shows, Reality TV | A talented batch of amateur bakers face off in... |
netflix_copy.isnull().sum()
show_id         0
type            0
title           0
director        0
cast            0
country         0
date_added      0
release_year    0
rating          0
duration        0
listed_in       0
description     0
dtype: int64
pip install -U ydata-profiling
import ydata_profiling as prf
# Assuming `netflix` is your DataFrame
netflix_profile = prf.ProfileReport(netflix)
netflix_profile
netflix_profile.to_file(output_file="netflix21_before_preprocessing.html")
#to show duplicate rows
netflix[netflix.duplicated()]
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
#to show the count of duplicate rows
netflix.duplicated().sum()
0
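Since `show_id` is unique for every row, a full-row check like the one above cannot flag a title that was entered twice under different ids. A hedged sketch of a narrower check follows; the `toy` DataFrame and the choice of `title` plus `type` as the comparison key are assumptions made for illustration.

```python
import pandas as pd

# Toy data (illustrative values only): s1 and s2 describe the same title
toy = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "title": ["A", "A", "B"],
    "type": ["Movie", "Movie", "Movie"],
})

# Full-row comparison: the differing show_id hides the repeat
full_row_dupes = toy.duplicated().sum()

# Compare only the columns that describe the content
content_dupes = toy.duplicated(subset=["title", "type"]).sum()

print(full_row_dupes, content_dupes)
```

Which columns belong in `subset` is a judgment call; the key used here is only one possibility.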
#check the size of the dataset after cleaning
netflix_copy.shape
(5700, 12)
#save netflix copy to csv
netflix_copy.to_csv('netflix_clean.csv')
EDA of different questions
netflix_copy.groupby('type')['title'].count()
type
Movie      5522
TV Show     178
Name: title, dtype: int64
The cleaned dataset contains 5522 movies and 178 TV shows.
2. Which type of content is more common on Netflix, movies or TV shows?
netflix_copy.type.value_counts().to_frame('value_count')
| type | value_count |
|---|---|
| Movie | 5522 |
| TV Show | 178 |
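The same counts can be turned into percentage shares directly. This is a minimal sketch that rebuilds the two counts from the table above as a Series; `counts` and `share` are hypothetical names used only here.

```python
import pandas as pd

# Counts copied from the value_counts table above
counts = pd.Series({"Movie": 5522, "TV Show": 178}, name="value_count")

# Share of each type as a percentage of all titles, rounded to 2 decimals
share = (counts / counts.sum() * 100).round(2)

print(share)
```

On the real data, `netflix_copy['type'].value_counts(normalize=True) * 100` gives the shares in one step without typing the counts by hand.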
sns.countplot(x=netflix_copy['type'])
<Axes: xlabel='type', ylabel='count'>
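Movies far outnumber TV shows in the cleaned dataset. This is a count of titles, not of views; the dataset has no viewership data. The gap is also wider than in the raw data (6131 movies out of 8807 titles), because dropping rows with a missing director or cast removed most TV shows.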
#compute the counts instead of typing them by hand, so labels and values stay in sync
type_counts=netflix_copy['type'].value_counts()
plt.pie(type_counts,labels=type_counts.index,autopct="%2.2f%%")
plt.show()
3. What are the different ratings used on Netflix?
netflix_copy['rating'].nunique()
18
sns.countplot(x=netflix_copy['rating'])
plt.xticks(rotation=90)
[Count plot of rating. X-axis categories: TV-MA, PG, TV-14, PG-13, TV-PG, TV-Y, R, TV-G, TV-Y7, G, NC-17, 74 min, 84 min, 66 min, NR, TV-Y7-FV, UR, missing]
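TV-MA and TV-14 are the most common ratings, and NC-17 is among the least common. These are counts of titles in the catalogue, not audience preferences.
The rating column has 18 distinct values in the cleaned dataset, but only 14 are real ratings: one is the 'missing' placeholder and three ('74 min', '84 min', '66 min') are durations that were entered in the rating column.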
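The x-axis labels include '74 min', '84 min' and '66 min', which look like runtimes that ended up in the rating column. One way to repair such rows is sketched below on toy data. It assumes the affected rows are the ones whose duration is missing, which has not been verified against the real file; the `toy` DataFrame and variable names are hypothetical.

```python
import pandas as pd

# Toy data (illustrative values only), already filled with the 'missing' placeholder
toy = pd.DataFrame({
    "rating": ["TV-MA", "74 min", "PG", "missing", "TV-MA", "84 min"],
    "duration": ["90 min", "missing", "100 min", "1 Season", "2 Seasons", "missing"],
})

# Flag rating entries that look like a runtime such as "74 min"
looks_like_duration = toy["rating"].str.fullmatch(r"\d+ min")

# Move the misplaced runtime into duration, then mark the rating as missing
toy.loc[looks_like_duration, "duration"] = toy.loc[looks_like_duration, "rating"]
toy.loc[looks_like_duration, "rating"] = "missing"

# Count genuine rating categories, ignoring the placeholder
real_ratings = toy.loc[toy["rating"] != "missing", "rating"].nunique()
print(real_ratings)
```

Excluding the placeholder before calling `nunique()` keeps the count limited to real rating categories.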
4 Show only the title of all TV shows that were released in India only.
netflix[(netflix['type']== 'TV Show') & (netflix['country']=='India')]['title']
4 Kota Factory
39 Chhota Bheem
50 Dharmakshetra
66 Raja Rasoi Aur Anya Kahaniyan
69 Stories by Rabindranath Tagore
...
8173 Thackeray
8235 The Calling
8321 The Golden Years with Javed Akhtar
8349 The House That Made Me
8775 Yeh Meri Family
Name: title, Length: 79, dtype: object
netflix[(netflix['type']== 'TV Show') & (netflix['country']=='India')]['title'].count()
79
There are 79 TV shows whose only listed country is India.
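The `== 'India'` comparison matches only rows where India is the sole listed country. Titles with several comma-separated countries are excluded. The sketch below contrasts the exact match with a substring match on toy data; the `toy` DataFrame and variable names are hypothetical.

```python
import pandas as pd
import numpy as np

# Toy data (illustrative values only)
toy = pd.DataFrame({
    "type": ["TV Show", "TV Show", "Movie", "TV Show"],
    "title": ["A", "B", "C", "D"],
    "country": ["India", "India, United States", "India", np.nan],
})

is_tv = toy["type"] == "TV Show"

# Exact match: India is the only listed country
india_only = toy[is_tv & (toy["country"] == "India")]["title"]

# Substring match: India appears anywhere in the list.
# na=False treats a missing country as "no match" instead of NaN.
india_any = toy[is_tv & toy["country"].str.contains("India", na=False)]["title"]

print(list(india_only))
print(list(india_any))
```

The exact match answers the "India only" question as posed; the substring match would be the choice if co-productions should count as well.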
5. Show the top 10 directors who contributed the highest number of TV shows and movies to Netflix.
netflix['director'].value_counts().head(10)
director
Rajiv Chilaka             19
Raúl Campos, Jan Suter    18
Marcus Raboy              16
Suhas Kadav               16
Jay Karas                 14
Cathy Garcia-Molina       13
Martin Scorsese           12
Youssef Chahine           12
Jay Chapman               12
Steven Spielberg          11
Name: count, dtype: int64
netflix['director'].value_counts().head(10).plot(kind='bar')
<Axes: xlabel='director'>
6. How many movies have the "TV-14" rating in Canada?
netflix[(netflix['type'] == 'Movie') & (netflix['rating'] == 'TV-14') & (netflix['country']=='Canada')].shape
(13, 12)
There are 13 movies with the "TV-14" rating whose only listed country is Canada.
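Insights based on EDA:
1) After dropping rows with a missing director or cast, the cleaned dataset contains 5522 movies and 178 TV shows.
2) Movies far outnumber TV shows in the cleaned dataset. This reflects the number of titles, not viewership, since the dataset has no watch data.
3) The rating column has 18 distinct values in the cleaned dataset, of which 14 are real ratings; the rest are the 'missing' placeholder and three misplaced durations. TV-MA and TV-14 are the most common ratings, and NC-17 is among the least common.
4) There are 13 movies with the "TV-14" rating whose only listed country is Canada.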